Performance and Scalability of Broadcast in Spark

Author

  • Mosharaf Chowdhury
Abstract

Although the MapReduce programming model has so far been highly successful, not all applications are well suited to it. Spark bridges this gap by providing seamless support for iterative and interactive jobs that are hard to express using the acyclic data flow model pioneered by MapReduce. While benchmarking Spark, we found that the default broadcast mechanism implemented in the Spark prototype hinders its scalability. In this report, we implement, evaluate, and compare four different broadcast mechanisms (including the default one) for Spark. We outline the basic requirements of a broadcast mechanism for Spark and analyze each of the compared mechanisms against those requirements. Our experiments in high-speed, low-latency, and cooperative data center environments also shed light on the characteristics of multicast and broadcast mechanisms in data centers in general.

Related resources

An Overview of Group Key Management Issues in IEEE 802.16e Networks

The computer industry has defined the IEEE 802.16 family of standards that will enable mobile devices to access a broadband network as an alternative to digital subscriber line technology. As the mobile devices join and leave a network, security measures must be taken to ensure the safety of the network against unauthorized usage by encryption and group key management. IEEE 802.16e uses Multica...

Scalability Potential of BWA DNA Mapping Algorithm on Apache Spark

This paper analyzes the scalability potential of embarrassingly parallel genomics applications using the Apache Spark big data framework and compares their performance with native implementations as well as with Apache Hadoop scalability. The paper uses the BWA DNA mapping algorithm as an example due to its good scalability characteristics and due to the large data files it uses as input. Resul...

Experimental Investigation on Hydrous Methanol Fueled HCCI Engine Using Spark Assisted Method

The present work investigates the performance and emission characteristics of hydrous methanol fuelled Homogeneous Charge Compression Ignition (HCCI) engine. In the present work a regular diesel engine has been modified to work as HCCI engine. Hydrous methanol is used with 15% water content in this HCCI engine and its performance and emission behavior is documented. A spark plug is used for ass...

Experimental Study of Performance of Spark Ignition Engine with Gasoline and Natural Gas

The tests were carried out with the spark timing adjusted to the maximum brake torque timing in various equivalence ratios and engine speeds for gasoline and natural gas operations. In this work, the lower heating value of gasoline is about 13.6% higher than that of natural gas. Based on the experimental results, the natural gas operation causes an increase of about 6.2% brake special fuel consumpt...

DduP - Towards a Deduplication Framework Utilising Apache Spark

This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first prototype exists but the overall project status is work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...

Publication date: 2010